Remove python-gil requirement and enable free-threaded Python#2250
Remove python-gil requirement and enable free-threaded Python#2250ndgrigorian wants to merge 24 commits into
python-gil requirement and enable free-threaded Python#2250Conversation
dcac1f6 to
e450664
Compare
085aece to
5dda977
Compare
|
View rendered docs @ https://intelpython.github.io/dpctl/pulls/2250/index.html |
82e9a5e to
a1d36ce
Compare
|
@antonwolfy @vlad-perevezentsev |
6a5046b to
7d8bbc8
Compare
b610ee3 to
dd74214
Compare
1b70ce6 to
1bc9c3d
Compare
free-threaded builds use a new GC that skips PyGC_Head, and this seems to cause some objects to change in size by ~16 bytes
1bc9c3d to
f1f0f76
Compare
f1f0f76 to
30a50e5
Compare
b37ca36 to
732f37d
Compare
dpctl_capi singleton intiailization could cause deadlocks with updated order manager
732f37d to
df6f452
Compare
vlad-perevezentsev
left a comment
There was a problem hiding this comment.
No more comments from my side
LGTM
Thank you @ndgrigorian
| // acquire gil to safely call into Python C API | ||
| py::gil_scoped_acquire acquire; | ||
|
|
||
| capi_ptr = new dpctl_capi(); |
There was a problem hiding this comment.
Does switching from a function-local static to a heap pointer means ~dpctl_capi() and the whole Deleter finalization guard (lines 170-186) are now dead code — the held Python objects leak?
I guess leaking a process singleton is a defensible no-GIL trade-off, but it bypasses machinery that was deliberately written for interpreter finalization.
Do we need to add an explicit comment stating it's intentional?
| self._state = _OrderManager(16) | ||
|
|
||
| def __dealloc__(self): | ||
| def __del__(self): |
There was a problem hiding this comment.
__dealloc__ was dead on a pure-Python class; renaming to __del__ makes it actually call SyclEvent.wait_for(...).
Under the free-threading, finalizers run on arbitrary threads and possibly during interpreter shutdown — combined with the existing weakref.finalize path, two finalizers now wait on events.
Confirm this is intended and that it's guarded against shutdown.
| module. | ||
| """ | ||
|
|
||
| import concurrent.futures |
There was a problem hiding this comment.
SequentialOrderManager's map is threading.local(), so each worker thread gets a fresh, thread-private order manager.
The independent_threads/fork_join cases pass only because each thread does its own blocking wait() — the order manager is effectively a no-op for cross-thread coordination. Only explicit_event_passing does genuine cross-thread handoff (via explicit add_event_pair).
It seems we need at least to reframe the docstrings to say each thread keeps its own sequential order and cross-thread deps must be passed explicitly.
| @@ -0,0 +1,199 @@ | |||
| # Copyright 2020-2025 Intel Corporation | |||
There was a problem hiding this comment.
Seems we haven't to use the range:
| # Copyright 2020-2025 Intel Corporation | |
| # Copyright 2026 Intel Corporation |
|
|
||
| [project.optional-dependencies] | ||
| coverage = ["Cython", "pytest", "pytest-cov", "coverage", "tomli"] | ||
| coverage = ["Cython", "pytest", "coverage", "tomli"] |
There was a problem hiding this comment.
What is about the --cov-report term-missing below?
Actually ini.options has to be renamed to ini_options and it's only the reason why pytest is not failing with that change (removed pytest-cov).
| }; // namespace | ||
|
|
||
| PYBIND11_MODULE(_device_queries, m) | ||
| PYBIND11_MODULE(_device_queries, m, py::mod_gil_not_used()) |
There was a problem hiding this comment.
Should we update the example in docs/doc_sources/api_reference/dpctl_pybind11.rst with the same py::mod_gil_not_used()?
|
|
||
| #include <algorithm> | ||
| #include <cstddef> | ||
| #include <mutex> |
There was a problem hiding this comment.
Missing <utility> include
| self.__device_map__[key] = dev | ||
| return dev | ||
|
|
||
| def _update_map(self, dev_map): |
There was a problem hiding this comment.
_update_map is public + unlocked — safe today (only called on the not-yet-shared _copy), but a foot-gun if ever called on the live global. We probably might need to consider locking it internally or clear documenting.
There was a problem hiding this comment.
Or alternatively to revert back to cdef at least to make it inaccessible from the python code, i.e. to have more control when the method is called.
| def _update_map(self, dev_map): | ||
| self.__device_map__.update(dev_map) | ||
|
|
||
| def __copy__(self): |
There was a problem hiding this comment.
Seems not covered by any test
| self.__device_queue_map__[ctx_dev] = q | ||
| return q | ||
|
|
||
| def _update_map(self, dev_queue_map): |
There was a problem hiding this comment.
Same comment to _update_map
This PR removes the required
python-gildependency from the dpctl conda package workflow and enables free-threaded Python in extension modulesAdjustments are made to the
SequentialOrderManagerclass such that the class is safe in free-threaded, including mutexes in the C++ class as a fall-back in case of (not recommended) simultaneous access to its members and methodsSequentialOrderManager now maintains thread-local storage for individual queue-to-manager-maps, such that each thread has its own manager per queue.
Queue and device caching, meanwhile, are global, to create a concept of default queues and devices that allows operations in extensions like dpnp (which rely on queues being the same for compute follows data) to operate on data passed between threads without copy.
In the futue, per-thread-queues and devices may prove more efficient, in which case, extensions will be asked to be made more robust (checking that context and device are the same, not using queue as a shortcut).
This PR builds on top of work already done removing the tensor submodule, which is pending migration to dpnp